Machine translation - a view from the lexicon

Author

  • Bonnie J. Dorr
Abstract

Books describing novel approaches to machine translation (MT) are always welcome, all the more so when the approach is one not covered by general MT surveys such as those in Hutchins and Somers (1992) or Arnold et al. (1994). Bonnie Jean Dorr's Machine Translation: A View from the Lexicon is such a book. It describes the interlingual MT system UNITRAN, which is rooted in two Massachusetts-based frameworks of theoretical linguistics: Chomskyan principles-and-parameters government-binding (GB) theory for the syntactic component, and Jackendovian lexical conceptual structure (LCS) for the lexical-semantic component, which also serves as the interlingua. The main claim is that cross-linguistic lexical-semantic divergences between source and target languages (at least across basic English, Spanish, and German) fall into roughly seven types, and that this allows a simple, systematic translation mapping (relating the interlingua to the corresponding syntactic structures) that is parameterized by switches and uses no language-specific rules.

Besides introductions, conclusions, and appendices, the book is organized into three parts covering UNITRAN's syntactic component, its lexical-semantic component, and the application of the model. Chapter 1 is an introduction to the book. It briefly describes the basics of MT, including alternative approaches, and attempts to justify Dorr's parameterized, interlingual, principle-based design. It also begins a preliminary discussion of translation divergences. One example is the categorial divergence seen in the English I am hungry, in which the predicate hungry is adjectival, compared with the German Ich habe Hunger ('I have hunger'), in which the corresponding Hunger is nominal.

Chapters 2 and 3 form the part dealing with the syntactic component. The former discusses the implementation of the GB modules, together with the parameter settings specific to each of English, Spanish, and German. The latter deals with the two-level morphological processor used in UNITRAN for analysis and generation.

Chapter 4, the first in the part dealing with the lexical-semantic component, is to a large extent a variant of Dorr (1993a). It describes the interlingual representation of UNITRAN. The chosen interlingua is an extended version of LCS, which is also used as the representation of lexical entries. Dorr justifies this choice as follows:

Similar resources

The Effect of Lexicon-based Debates on the Felicity of Lexical Equivalents in Translating Literary Texts by Iranian EFL Learners

This study was an attempt to investigate the effect of lexicon-based debates on the felicity of lexical equivalents in translating literary texts by Iranian EFL learners. To fulfill the purpose of this study, 59 university students, majoring in English Translation, were randomly assigned to the experimental and control groups from a total of 73 students based on their performance on a mock TOE...

Defining the Lexical Component in Interlinguas

As discussed by Dorr and Voss (1993, 1994), machine translation (MT) theory has not addressed the issues surrounding how the interlingua (IL) of an MT system should be defined or evaluated. This has a direct bearing on the decisions developers make with respect to the construction of a lexicon for MT. We view the IL as two distinct components: the declarative portion, which we call the "Lexical Com...

Dealing with Replicative Words in Hindi for Machine Translation to English

The South Asian languages are well-known for their replicative words. In these languages, words of almost all the grammatical categories can occur in their reduplicative form. Hindi is one such language which is quite rich in having various types of replicative words in its lexicon. The traditional grammars and some of the research works have discussed the topic to some extent, particularly fro...

Machine Translation between Language Stages: Extracting Historical Grammar from a Parallel Diachronic Corpus of Polish

This paper explores methods for the extrapolation of correspondences in a small parallel diachronic corpus taken from the Modern and Middle Polish Bible, in an attempt to answer the question “can historical grammar and lexica be derived directly from a corpus?” The problem of extracting this data is approached from a machine translation point of view: by envisioning texts from different periods...

Evaluating Resources for Query Translation in Cross-Language Information Retrieval

Our goal is to evaluate the utility of a lexical resource containing Lexical Conceptual Structures (LCS) for use in cross-language information retrieval. Our evaluation makes use of a combination of techniques from interlingual machine translation (Dorr) with conventional information retrieval techniques (Oard; Oard and Dorr). Given a query in one language, we transform the query into the corresponding ter...

Improving the Performance of an Example-Based Machine Translation System Using a Domain-specific Bilingual Lexicon

In this paper, we study the impact of using a domain-specific bilingual lexicon on the performance of an Example-Based Machine Translation system. We conducted experiments for the English-French language pair on in-domain texts from Europarl (European Parliament Proceedings) and out-of-domain texts from Emea (European Medicines Agency Documents), and we compared the results of the Example-Based ...

Publication date: 1993